ABSTRACT


Tuberculosis (TB) is one of the deadliest infectious diseases in the world.
TB is caused by a tubercle bacillus called Mycobacterium tuberculosis.
Early detection of TB is pivotal to decreasing morbidity and mortality.
TB is diagnosed using a chest x-ray and a sputum test. A challenge for
radiologists is to avoid confusing TB with lung cancer, and hence
misdiagnosing it, because the two mimic each other. Semi-automated TB
detection using machine learning found in the literature requires
identification of objects of interest. The similarity of tissues, veins and
small nodules present in the image at the initial stage may hamper the
detection. In this paper, an approach to detect TB that does not require
segmentation of objects of interest, based on ensemble deep learning, is
presented. Evaluation on publicly available datasets shows that the proposed
approach produced a model that recorded the best accuracy, sensitivity and
specificity of 91.0%, 89.6% and 90.7% respectively.
1. INTRODUCTION

Tuberculosis (TB) is a potentially serious infectious disease caused by the bacterium Mycobacterium
tuberculosis, and it most often affects the lungs. In its 2018 global tuberculosis report, the World Health
Organization (WHO) estimated that 1.6 million people infected with TB died (1.2-1.4 million HIV-negative
and 0.3 million HIV-positive) and 10.0 million fell ill [1]. Amongst HIV patients, TB is the leading cause
of death [1].

Tuberculosis is diagnosed mostly using the skin test, chest x-ray (CXR) and a sputum test [2]. CXR
can detect TB to a certain extent, but it cannot establish whether the patient has TB or some other infection
due to difficulties in identifying malignancy [3]. Nevertheless, CXR has become an important tool to detect TB
due to the increased availability of radiography, in particular digital radiography, with better image quality
and safety [4]. Early detection of lung related diseases such as tuberculosis is pivotal to decreasing morbidity
and mortality [5]. However, in Malaysia most lung cancer and tuberculosis cases are diagnosed late, which
reduces the chance of survival [6]. For a century, a challenge for radiologists has been to avoid confusing
tuberculosis with other lung related diseases, and hence misdiagnosing it, because they mimic each other [7].
A semi-automated system that effectively classifies pulmonary nodules with a low false positive rate is deemed
necessary to assist radiologists in screening chest radiograph images [8].

The application of machine learning and image processing approaches to medical images has been
widely studied; a comprehensive review is presented in [9] and some examples can be found in [10, 11].
Those works have produced superior classification performance compared to conventional approaches. With
respect to semi-automated tuberculosis detection, examples of previous work have been reported
in [9, 12, 13]. In [9], two sets of features, object detection based and content based, were extracted
from the images and a Support Vector Machine (SVM) was employed as the classifier. The best
accuracy of 84.1% was recorded on the Shenzhen dataset. Another work, which applied image
registration before object based features were extracted to detect tuberculosis, recorded a best accuracy of
60% [12]. In [13], different types of features and analysis were employed to improve the classification result.

Recently, deep learning architectures have been employed to learn features from medical
images [14]. Deep learning is an approach that applies deep architectures comprising many layers of
non-linear information processing to learn features from data [15]. Deep learning techniques are categorized
into two groups: deep discriminative networks such as the Convolutional Neural Network (CNN), and
unsupervised methods such as the deep belief network (DBN). Deep learning computes hierarchical features
from images, where the higher level features are defined from lower level features. Its ability to learn
features at the higher level has been shown to produce better results. Examples of work that employ deep
learning, in particular CNN, to detect tuberculosis can be found in [16] and [17]. A customized CNN produced
a best accuracy of 82.09% [16]. In [17], an optimized CNN architecture was proposed that reduced the amount
of computation without sacrificing performance. The best accuracies reported were 79.0% and 84.4% for
the Montgomery and Shenzhen datasets respectively.

In the machine learning domain, ensemble based classifiers are widely used to build predictive models
by integrating multiple models [18]. There are a number of different ensemble techniques; the most used are
bagging and boosting. The bagging method combines the outputs of various learned classifiers into a single
prediction by means of voting, while the boosting approach repeatedly runs a weak learner on different sets
of training data and then combines the results into a single strong classifier. To the best of our knowledge,
only a limited number of previous works consider an ensemble of deep learning architectures to learn features
for TB detection; some examples can be found in [19-21]. However, the work presented in [19] was limited to
pre-trained CNN (GoogLeNet, ResNet and VggNet) features only, where the best accuracy recorded was 84.7% on
the Shenzhen dataset. The work presented in [20] employed AlexNet and GoogLeNet with excessive
augmentation (including radiologists' intervention) and recorded an Area Under Curve (AUC) of 99% using
pre-trained models. An additional convolutional layer for feature extraction on top of the basic CNN was
proposed in [21], whereby an accuracy of 84% was recorded for the Shenzhen dataset.

Based on the literature, more work is required to achieve a better TB detection rate, in its early stage
in particular, with minimal human intervention. In this paper, an approach that ensembles different deep
learning architectures for tuberculosis detection is presented. The contributions of this paper are as follows:

a) Three different deep learning architectures were employed to perform tuberculosis detection. 

b) An ensemble technique in the form of soft voting was used to combine the different deep learning
architectures employed in tuberculosis detection.

The proposed ensemble deep learning approach, in the form of CNN, for tuberculosis detection is presented in
Section 2. Section 3 describes the experimental setup and discussion of results. Section 4 concludes this paper.


2. THE PROPOSED APPROACH 

In this paper, we present an approach to detect TB using ensemble deep learning classifiers. Figure 1
shows the framework of TB detection using ensemble deep learning classifiers. It has three main modules:
(i) image pre-processing, (ii) classifier generation, and (iii) ensemble classification. These modules were
implemented across two stages: the training stage for deep learning model generation, and the test stage to
perform TB detection. Details of each of the modules, with respect to the proposed approach, are presented
in Sub-sections 2.1 to 2.3 below.
Figure 1. The proposed deep learning based TB detection approach


2.1. Image Pre-processing 

The acquired images were first pre-processed to retain only the Region Of Interest (ROI). Figure 2
shows an example of a CXR image. With respect to the work presented in this paper, the ROI is the lung area.
To isolate it, a mask of the lung area was constructed for each CXR image. The constructed mask was
then superimposed on the CXR image. Figure 3 shows the masked version of the image in Figure 2.
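
The masking step can be illustrated with a minimal sketch. The paper does not state how the mask is built or what value is given to pixels outside the lungs, so the sketch assumes a binary mask of the same size as the image and a black (zero) background; the function name apply_lung_mask and the toy values are hypothetical.

```python
def apply_lung_mask(image, mask):
    """Keep pixels where the mask is non-zero; set all other pixels to 0 (assumed background)."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else 0 for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 3x4 grayscale "CXR" and a hypothetical binary lung mask.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
masked = apply_lung_mask(image, mask)
```

Only the pixels inside the mask survive, so a classifier trained on such images cannot rely on content outside the lung area.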








Figure 2. Chest X-Ray image

Figure 3. Chest X-Ray image with ROI identified


2.2. CNN Classifier Generation 

In this paper, we employed three CNN architectures: InceptionV3, VGG16 and a custom-built
architecture. CNN was selected due to its ability to extract and learn meaningful features on its own [8].
Further readings on InceptionV3 and VGG16 can be found in [22] and [23] respectively. With respect to
the custom-built CNN architecture, we reduced the number of layers to 15. By reducing the number of layers,
fewer parameters are generated for training and hence less time is required to complete the training.
Each of the CNN architectures described above generated its own model based on the pre-trained model and
the dataset used in this paper. The generated models were then used to perform TB detection.
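
The link between depth and parameter count can be made concrete with a small calculation. The sketch below counts the weights and biases of a stack of convolutional layers. The VGG16 channel widths are those of its 13 standard 3x3 convolutional layers; the shallower stack is purely hypothetical and is not the custom-built architecture of this paper, whose layer configuration is not given here.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one 2D convolutional layer with a square kernel."""
    return (kernel * kernel * c_in + 1) * c_out


def stack_params(channels, kernel=3, c_in=3):
    """Total parameters of consecutive conv layers; channels lists each layer's output channels."""
    total = 0
    for c_out in channels:
        total += conv_params(kernel, c_in, c_out)
        c_in = c_out
    return total


# The 13 convolutional layers of VGG16 (3x3 kernels, RGB input).
VGG16_CONV = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
# Hypothetical shallower stack, NOT the paper's custom-built architecture.
SHALLOW_CONV = [32, 64, 128, 256]

vgg16_total = stack_params(VGG16_CONV)
shallow_total = stack_params(SHALLOW_CONV)
print(vgg16_total, shallow_total)
```

The saving depends on layer widths as well as depth, so the figures for the hypothetical stack only indicate the order of magnitude that a shallower network can achieve.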


2.3. Ensemble Classification 

Each of the generated CNN models, as described in the foregoing sub-section, performed TB
detection individually. With the aim of producing a better detection result, ensemble classification was
employed. With respect to the work presented in this paper, majority voting was used. To predict the final
label of each test image (TB or non-TB), the predictions made by the individual CNN models were considered
collectively and the label with the most votes was selected as the final label.
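
The voting step can be sketched as follows. This sub-section describes majority voting, while the contributions listed in the Introduction mention soft voting, so the sketch shows both for comparison. The function names, the 0.5 threshold and the example probabilities are assumptions made for illustration, not values from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by most classifiers (hard voting)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Average the per-model TB probabilities and threshold the mean (soft voting)."""
    mean_prob = sum(probabilities) / len(probabilities)
    return "TB" if mean_prob >= threshold else "non-TB"


# Hypothetical TB probabilities from three classifiers for one test image.
tb_probs = [0.35, 0.45, 0.90]
labels = ["TB" if p >= 0.5 else "non-TB" for p in tb_probs]

hard_label = majority_vote(labels)  # two of three models say non-TB
soft_label = soft_vote(tb_probs)    # one confident model lifts the mean above 0.5
```

With three classifiers and two classes a majority vote can never tie. The example is chosen so that the two schemes disagree, which shows that the choice of voting rule can change the final label.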

2.4. Performance Measurement

To measure the performance of the proposed approach, three metrics were employed:
sensitivity, specificity and accuracy. Sensitivity measures how well the proposed approach identifies
TB cases, while specificity measures how well it identifies non-TB cases. Accuracy measures the overall
proportion of cases, TB and non-TB, that are classified correctly.
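
The three metrics follow from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), with TB taken as the positive class. The sketch below uses the standard definitions; the counts are hypothetical and are not results from this paper.

```python
def evaluate(tp, tn, fp, fn):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of TB cases that were detected
    specificity = tn / (tn + fp)  # fraction of non-TB cases correctly labelled non-TB
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sens, spec, acc = evaluate(tp=45, tn=40, fp=10, fn=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```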


3. EXPERIMENTAL SETUP AND DISCUSSION OF RESULTS 
The dataset used to evaluate the proposed work, the nature of the experiments conducted, 
and discussions of the results obtained are described in this section. 


3.1. The Dataset 

To evaluate the proposed approach presented in this paper, two public CXR image datasets, Montgomery
and Shenzhen [24], were used. The Montgomery dataset consists of 80 normal images and 58 images
with manifestations of TB. The size of each CXR is either 4020x4892 pixels or 4892x4020 pixels, in Portable
Network Graphics (PNG) format. The Shenzhen dataset consists of 662 images, of which 336 are TB
manifested CXRs. The size of the images is approximately 3000x3000 pixels.


3.2. Experimental Setup 

The aim of the experiments conducted was to evaluate the performance of the proposed approach to
TB detection using CXR images. To achieve this, two sets of experiments were conducted. The first was to
evaluate the performance of different CNN architectures for TB detection. The second was conducted to
compare the performance of the ensemble of the classifiers used in experiment 1 with the individual
classifiers. Ten-fold Cross Validation (TCV) was employed, whereby the dataset was randomly divided into ten
equal sized subsets; on each iteration one subset was used as the test set while the others were used as the
training set. The images were resized to 300x300 pixels before they were fed to the CNN architectures for model
generation. The size selected is similar to that used in [19] and [20]. With respect to the work
presented in this paper, the Keras [25] implementations of InceptionV3 and VGG16 were used. Both
architectures were pre-trained using the ImageNet dataset.
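
The ten-fold split can be sketched with the standard library alone. The function name k_fold_indices and the fixed seed are assumptions, and the example uses 662 samples, the size of the Shenzhen dataset, only to show the behaviour when the sample count is not divisible by ten.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold serves exactly once as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(662))  # 662 = size of the Shenzhen dataset
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Every image appears in exactly one test fold, so the metrics averaged over the ten iterations cover the whole dataset without any image being tested by a model that was trained on it.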


3.3. Experimental Results 
The results of the two sets of experiments described above are presented in Sub-sections 3.3.1
and 3.3.2 respectively.


3.3.1 Performance of Different CNN Architectures to TB Detection 

In this experiment, the images were classified as either TB or non-TB. For each CNN architecture,
the learning rate and the number of epochs were set to 1e-5 and 150 respectively. A sigmoid classifier was
employed to train the model. Table 1 shows the results obtained. The best sensitivity, specificity and
accuracy were recorded by the custom-built CNN architecture.
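
How a sigmoid output yields a TB or non-TB label can be sketched as below. The paper does not state the decision threshold or the loss function; the 0.5 threshold and the binary cross-entropy loss used here are common choices and are assumptions, as are the function names.

```python
import math


def sigmoid(z):
    """Map a real-valued network output to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Label an image TB when the sigmoid output reaches the (assumed) threshold."""
    return "TB" if sigmoid(z) >= threshold else "non-TB"


def binary_cross_entropy(y_true, p):
    """Loss for one sample; y_true is 1 for TB and 0 for non-TB (assumed loss function)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


print(predict_label(2.0), predict_label(-1.0))
print(binary_cross_entropy(1, sigmoid(2.0)))
```

A confident correct prediction gives a small loss and a confident wrong one gives a large loss, which is what drives the weight updates at the stated learning rate.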


Table 1. The Performance of Different CNN Architectures to TB Detection 


CNN Architecture    Sensitivity (%)    Specificity (%)    Accuracy (%)
InceptionV3         80.7               90.2               85.0
VGG16               81.7               90.3               85.6
Custom              88.6               91.7               90.0


3.3.2 Performance of Ensemble CNN to TB Detection 
For the second set of experiments, the individual CNN classifiers were combined into an ensemble to
perform the classification. Table 2 shows the result obtained.


Table 2. The Performance of Ensemble CNN Architectures to TB detection 


CNN Architecture    Sensitivity (%)    Specificity (%)    Accuracy (%)
Ensemble            89.6               90.7               91.0


3.4. Analysis of Results 

Based on the results shown in Tables 1 and 2, we observed that the proposed ensemble CNN
improved the detection of TB cases (sensitivity) and the overall accuracy by 1 percentage point each over
the best individual classifier, the custom-built CNN. Although the specificity decreased, also by 1 percentage
point, sensitivity is deemed more pertinent in medical image analysis as we do not want to misclassify a
positive case as negative. On further inspection of how the CNN classifies a CXR as TB or non-TB, we found
that the network focused on both upper lobes. Figure 4 shows examples of heat map overlays obtained from the
fifth convolutional layer of the InceptionV3 classifier, where the dark purple background represents areas
that were not activated while dark red areas were heavily activated. For future work, it is recommended to
focus only on the activated areas for TB detection.
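
The comparison between the ensemble and the best individual classifier can be checked with a short calculation. The values are copied from Table 1 (custom-built CNN) and Table 2 (ensemble); the variable names are illustrative.

```python
# Values copied from Table 1 (custom-built CNN) and Table 2 (ensemble), in percent.
custom = {"sensitivity": 88.6, "specificity": 91.7, "accuracy": 90.0}
ensemble = {"sensitivity": 89.6, "specificity": 90.7, "accuracy": 91.0}

# Difference in percentage points, rounded to remove floating-point noise.
delta = {name: round(ensemble[name] - custom[name], 1) for name in custom}
print(delta)
```

The differences are absolute differences between percentages, that is, percentage points rather than relative changes.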











Figure 4. Activation heatmaps of CXRs using Inception V3 


4. CONCLUSION 

This paper presented an approach to detect TB using ensemble deep learning based features and
models. Three different CNN architectures were employed: InceptionV3, VGG16 and a custom-built
architecture. The results obtained show that the ensemble of the three classifiers using majority voting
produced the best TB detection performance. Further inspection using the InceptionV3 architecture found
that the upper lobes of the CXR were the most activated area, which indicates that the symptoms of TB could be
found mostly in that area. Hence, future work may focus the identification of features only on those areas,
which could improve the detection performance with respect to accuracy and reduce processing time.